We study the conditional distribution of goodness-of-fit statistics of the Cramér-von Mises type 
given the complete sufficient statistics in testing for exponential family models. We show that 
this distribution is close, in large samples, to that given by parametric bootstrapping, namely, 
the unconditional distribution of the statistic under the value of the parameter given by the 
maximum likelihood estimate. As part of the proof, we give uniform Edgeworth expansions of 
Rao-Blackwell estimates in these models. 

1. Introduction 

In this paper, we compare conditional and unconditional goodness-of-fit tests and give 
conditions under which the two give essentially identical results in large samples. Our 
results apply in testing fit for exponential family models for independent and identically 
distributed (i.i.d.) data, $X_1, \ldots, X_n$. Our interest is to test the null hypothesis that the
distribution of the individual $X_i$ belongs to a natural exponential family with density,
relative to some $\sigma$-finite measure, $\mu(dx)$, on some sample space $\Omega$, of the form

\[ f(x;\theta) = c(x)\exp\{\theta' T(x) - \kappa(\theta)\} \tag{1} \]

with natural parameter space $\Theta \subset \mathbb{R}^k$; we assume that $\Theta$ has non-empty interior, which
we denote $\mathrm{int}(\Theta)$. In (1), $T$ takes values in $\mathbb{R}^k$ and the prime $'$ denotes transposition.
A complete and sufficient statistic for the parameter $\theta$ is then

\[ T_n = T_n(X_1, \ldots, X_n) = \sum_{i=1}^n T(X_i). \]

To apply classical hypothesis testing ideas, we regard this model as a null hypothesis. 
We consider the omnibus alternative hypothesis that the sample is drawn from a distri- 
bution which is not in the parametric model. One common approach to this hypothesis 
testing problem is to define some statistic $S(X_1, \ldots, X_n; \theta)$ which measures in some way
the departure of the sample from what is expected if $\theta$ is the true value. Since $\theta$ is unknown,
it is replaced in this measure by $\hat\theta_n$, the maximum likelihood estimate of the parameter
vector, leading to the statistic $S_n = S(X_1, \ldots, X_n; \hat\theta_n)$.

Common examples include empirical distribution function statistics such as Cramér-
von Mises, Kolmogorov-Smirnov, Anderson-Darling and many chi-squared statistics.
The usual situation is that the test statistic has a distribution which depends, even in
large samples, on the unknown parameter value (exceptions arise in the normal and other
families which have only location and/or scale parameters). Thus, to implement the tests
in practice it is necessary to specify how to compute critical points for the tests or how
to compute appropriate P-values corresponding to the test statistics. A method long in
use is to derive large sample theory for the statistic $S_n$, establishing the convergence
in distribution of $S_n$ to some limiting distribution which depends on the true value
of $\theta$. If $c_\alpha(\theta)$ is the upper $\alpha$ critical point of this limiting distribution and $c_\alpha$ depends
continuously on $\theta$, then the test which rejects if $S_n > c_\alpha(\hat\theta_n)$ has asymptotic level $\alpha$. See
Lockhart and Stephens [9] for a discussion of this method in testing fit for the von Mises
distribution for directional data; this testing problem is discussed below in more detail.

A more modern method which achieves the same asymptotic behaviour is the para-
metric bootstrap. Let $H_n(\cdot;\theta)$ denote the cumulative distribution function of $S_n$ when
the true parameter value is $\theta$. Then

\[ P_b = 1 - H_n(S_n;\hat\theta_n) \]

is the parametric bootstrap P-value. This P-value is usually computed approximately
by generating some number, $B$, of bootstrap samples drawn from the density $f(\cdot;\hat\theta_n)$,
computing the statistic $S_n$ for each of these $B$ samples and then counting the fraction of
these bootstrap statistic values which exceed the value of $S_n$ for the data set at hand.
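
The bootstrap recipe just described can be sketched in a few lines. The following is only an illustration under stated assumptions: the exponential model with rate parameter, the Kolmogorov-Smirnov distance as the statistic, and all function names are choices made here for the example and are not taken from the paper. (For this scale family the bootstrap test happens to be exact, which is the exception noted above; the code is meant only to show the mechanics.)

```python
import math
import random


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF."""
    n = len(sample)
    u = sorted(cdf(x) for x in sample)
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))


def fit_rate(sample):
    """Maximum likelihood estimate of the exponential rate (example model)."""
    return len(sample) / sum(sample)


def statistic(sample):
    """S_n: the KS distance evaluated at the MLE computed from the same sample."""
    rate = fit_rate(sample)
    return ks_statistic(sample, lambda x: 1.0 - math.exp(-rate * x))


def bootstrap_p_value(data, n_boot=999, seed=0):
    """Fraction of bootstrap statistics that exceed the observed S_n."""
    rng = random.Random(seed)
    n = len(data)
    rate_hat = fit_rate(data)
    s_obs = statistic(data)
    exceed = 0
    for _ in range(n_boot):
        # Draw a sample from the fitted model f(.; theta_hat).
        boot = [rng.expovariate(rate_hat) for _ in range(n)]
        # The parameter is re-estimated inside statistic() for each sample.
        if statistic(boot) > s_obs:
            exceed += 1
    return exceed / n_boot
```

Note that the parameter is re-estimated within every bootstrap sample, so that each bootstrap value is computed in the same way as $S_n$ itself.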

These two methods for goodness-of-fit testing both depend on asymptotic theory to
justify their performance. They do not have, except in the location-scale situation
mentioned, exact level $\alpha$ and thus no exact finite sample optimality properties. Conditional
tests, which we discuss next, offer at least the potential for such optimality. (See Remark 9
in the Discussion section for some comments.)

One standard approach (discussed in detail in [5]) to optimality theory is to search for
powerful unbiased level $\alpha$ tests: tests whose power never falls below $\alpha$ on the alternative.
Such tests will generally have Neyman structure; that is, their level will be $\alpha$ everywhere
on the boundary of the null hypothesis. For the omnibus alternative, this boundary is
generally the entire model.

Now suppose $T_n$ is a complete sufficient statistic for this model. Then the requirement
that the level of the test be $\alpha$ everywhere in the parametric model and completeness
guarantee that the test must have conditional level $\alpha$. That is, an unbiased level $\alpha$
test must have the property that the conditional probability of rejection given $T_n$ is
identically $\alpha$. This is precisely the argument used in Lehmann and Romano [5] to show
that Student's t test is uniformly most powerful unbiased.

By a conditional test, then, we mean a test whose level, given the sufficient statistic
$T_n$, is identically $\alpha$. Two recent papers on goodness-of-fit, Lockhart, O'Reilly and
Stephens [7, 8], have compared such conditional tests with parametric bootstrap tests.
They implemented their conditional tests as follows. For a test statistic $S_n$, let $G_n(\cdot|\cdot)$
denote the conditional distribution function, when the true distribution of the data comes
from the exponential family, of $S_n$ given $T_n$. This function $G_n$ does not depend on $\theta$. If
this conditional distribution function is continuous then

\[ P_c = 1 - G_n(S_n|T_n) \]

has a uniform distribution under $H_0$; it is therefore an exact P-value. These P-values are
often computed by Monte Carlo or Markov Chain Monte Carlo; see Lockhart, O'Reilly
and Stephens [7, 8] for examples and references.
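
As a minimal sketch of a Monte Carlo conditional P-value, consider a case where the conditional distribution can be simulated directly. The example below assumes an exponential model (not one of the paper's examples), for which $T_n = \sum X_i$ and, given $T_n = t$, the vector $(X_1, \ldots, X_n)/t$ is uniformly distributed on the simplex; it can therefore be generated by normalising i.i.d. exponential variables. The statistic is the Cramér-von Mises statistic with unit weight, and all function names are hypothetical.

```python
import math
import random


def cvm_statistic(sample):
    """Cramér-von Mises statistic (unit weight) for the exponential model at the MLE."""
    n = len(sample)
    rate = n / sum(sample)
    u = sorted(1.0 - math.exp(-rate * x) for x in sample)
    return 1.0 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))


def conditional_sample(n, total, rng):
    """Draw (X_1, ..., X_n) from the exponential model conditional on sum(X) == total."""
    e = [rng.expovariate(1.0) for _ in range(n)]
    scale = total / sum(e)
    return [scale * x for x in e]


def conditional_p_value(data, n_sim=999, seed=0):
    """Monte Carlo estimate of P_c = 1 - G_n(S_n | T_n)."""
    rng = random.Random(seed)
    n, total = len(data), sum(data)
    s_obs = cvm_statistic(data)
    exceed = sum(
        cvm_statistic(conditional_sample(n, total, rng)) > s_obs
        for _ in range(n_sim)
    )
    return exceed / n_sim
```

Every simulated sample has the same value of the sufficient statistic as the data, which is what distinguishes this scheme from the parametric bootstrap.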

In Lockhart, O'Reilly and Stephens [7], for instance, the authors considered an i.i.d.
sample from the von Mises distribution. Observations $X_i$ are points on the unit circle;
see Section 4.2 below for details of the density. The complete sufficient statistic is
$T_n = \sum_{i=1}^n (\cos X_i, \sin X_i)$ and the authors use Watson's $U^2$ statistic for $S_n$. They use Markov Chain Monte
Carlo methods to generate a sequence of samples from the conditional distribution of
$X_1, \ldots, X_n$ given $T_n$; all the generated samples have the same value of $T_n$. The authors
evaluate $P_c$ by computing $U^2$ for each data set and estimating $P_c$ by the fraction of
samples giving larger values of $U^2$ than the original data sample.

These authors also compute the parametric bootstrap value, $P_b$, for the same statistic
by generating i.i.d. samples from the von Mises distribution using, for the parameter
value, the estimate of the parameter derived from the original data. Of course the values
of the sufficient statistic $T_n$ vary from one bootstrap sample to another. Again $U^2$
is computed for each bootstrap sample and a P-value is computed as the fraction of
bootstrap $U^2$ values which are larger than the observed value of $U^2$.

Very high correlations between the P-values computed using these two methods were 
observed in Lockhart, O'Reilly and Stephens [7]. For example, they considered a test 
that a sample of size 34 comes from a von Mises distribution. Using Watson's $U^2$ and
generating samples from the null hypothesis they observed a correlation of 0.997 between 
the two P-values. For a sample of 55 observations, the correlation observed was 0.9997. 

Here we show that for statistics $S_n$ of the Cramér-von Mises type these two methods
must give similar P-values because, when the null hypothesis is true,

\[ \sup_s |G_n(s|T_n) - H_n(s;\hat\theta_n)| \to 0 \]

in probability, at least when the model being tested is an exponential family. In fact, the
convergence is almost sure for samples from any distribution for which $\hat\theta_n$ converges
almost surely to an interior point of the parameter space. For statistics $S_n$ which are
sums of the form $\sum_i u_n(X_i,\theta)$ this result is established by Holst [4]. Our results extend
his to statistics which we now describe.

When $\Omega$ is the real line, many goodness-of-fit tests are based on statistics which are
functionals of the estimated empirical process

\[ W_n(s) = \sqrt{n}\{F_n(x) - F(x,\hat\theta_n)\}, \]

where we now use $F(x,\theta)$ for the cumulative distribution function, $s$ is related to $x$ by
$s = F(x,\hat\theta_n)$ and $F_n$ is the usual empirical distribution function:

\[ F_n(x) = n^{-1}\sum_{i=1}^n 1(X_i \le x). \]

Common choices for statistics include:
• Cramér-von Mises type:

\[ \int_0^1 \psi^2(s) W_n^2(s)\,ds; \tag{2} \]

• Watson type:

\[ \int_0^1 \Big\{W_n(s) - \int_0^1 \psi(u)W_n(u)\,du\Big\}^2 \psi(s)\,ds; \]

• Kolmogorov-Smirnov type:

\[ \sup_{0<s<1} |\psi(s)W_n(s)|. \]

In each case, $\psi$ is some weight function defined on $(0, 1)$.
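
For unit weight, $\psi \equiv 1$, the integral in (2) has a well known computing formula in terms of the ordered values $u_{(1)} \le \cdots \le u_{(n)}$ of $u_i = F(X_i,\hat\theta_n)$, namely $1/(12n) + \sum_i \{u_{(i)} - (2i-1)/(2n)\}^2$. The sketch below, with hypothetical function names, checks this formula against a direct piecewise integration of $W_n^2$; it assumes the $u_i$ have already been computed.

```python
def cvm_closed_form(u):
    """Computing formula for the integral of W_n(s)^2 over [0, 1] (unit weight)."""
    n = len(u)
    v = sorted(u)
    return 1.0 / (12 * n) + sum((v[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))


def cvm_by_integration(u):
    """Integrate n * (F_n(s) - s)^2 exactly, one step of the empirical CDF at a time."""
    n = len(u)
    knots = [0.0] + sorted(u) + [1.0]
    total = 0.0
    for i in range(n + 1):
        a, b = knots[i], knots[i + 1]
        c = i / n  # value of the empirical CDF on [a, b)
        # Integral of (c - s)^2 over [a, b].
        total += ((c - a) ** 3 - (c - b) ** 3) / 3.0
    return n * total
```

The two functions agree up to rounding error, which gives a quick check on any implementation of the statistic.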

The large sample analysis of the unconditional distribution of such statistics comes
from the well known weak convergence, in $D[0,1]$, of the process $W_n$ to a Gaussian
process, $W$, which we now describe. Let $I(\theta)$ be the Fisher information matrix and
define the column vector

\[ \xi(s,\theta) = \frac{\partial F(x;\theta)}{\partial\theta}, \]

where $x$ is defined as a function of $s$ by $F(x,\theta) = s$. Then the limit process $W$ has mean 0
and covariance function

\[ \rho_\theta(s,t) = \min\{s,t\} - st - \xi(s,\theta)'\{I(\theta)\}^{-1}\xi(t,\theta). \]

The statistics indicated above are all continuous functionals of $W_n$ (under mild conditions
on the weight functions involved) and as such converge in distribution to the same
functional applied to the limit process $W$. See Stephens [14] for a detailed discussion of
the resulting tests and Shorack and Wellner [12] for mathematical details.

The weak convergence result can be proved in two steps: prove convergence in distribution
of the finite dimensional distributions of $W_n$ and then prove tightness of the
sequence of processes in $D[0, 1]$. We believe a similar result holds, in exponential families,
conditional on the sufficient statistic. Results in Holst [4] can be used to establish convergence
of the conditional finite dimensional distributions but we are unable to extend the
calculations to prove conditional tightness. Instead we use Holst's results and a truncation
argument to deal directly with statistics of the Cramér-von Mises or Watson types.
Without tightness we cannot handle statistics of the Kolmogorov-Smirnov type.

Our truncation argument uses an accurate approximation to the conditional expectation,
given $T_n$, of the statistic in question. This approximation is based on an expansion
of the difference between a Rao-Blackwell estimate and the corresponding maximum
likelihood estimate. Our results here extend the work of Portnoy [11].

Section 2 gives precise statements of our results for the case of Cramér-von Mises
statistics. Section 3 gives the expansion of the Rao-Blackwell estimate. Section 4 applies
the calculations to two examples showing how to verify the main condition, Condition D
below, and illustrating the expansions of Section 3. Section 5 provides some discussion
and indicates the extension to Watson's statistic and other statistics which are quadratic
functionals of the empirical distribution. In that section, we consider power and discuss
various rephrasings of our main result. Details of some proofs are in Section 6.



2. Main results 

2.1. Absolutely continuous distributions 

We seek to test the hypothesis that the distribution of each $X_i$ belongs to a natural
exponential family with density, relative to some $\sigma$-finite measure $\mu(dx)$ on $\Omega$, of the
form (1) and complete sufficient statistic $T_n$ as described in the Introduction. We will
need a number of well known facts about exponential families which we gather here in
the form of a lemma.

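Lemma 1. The random vector $T_n$ has moment generating function

\[ E_\theta[\exp\{\phi' T_n\}] = \exp[n\{\kappa(\theta+\phi) - \kappa(\theta)\}], \]

which is finite whenever $\theta + \phi \in \Theta$, and cumulants $n\kappa_{i_1\cdots i_r}$ where

\[ \kappa_{i_1\cdots i_r} = \frac{\partial^r \kappa(\theta)}{\partial\theta_{i_1}\cdots\partial\theta_{i_r}}. \]

In particular, the mean of $T_n$ is

\[ E_\theta(T_n) = n\mu(\theta) = n\nabla\kappa(\theta), \]

where $\nabla$ is the gradient operator. The covariance matrix is

\[ \mathrm{Var}_\theta(T_n) = nV(\theta) = n\nabla^2\kappa(\theta), \]

where $\nabla^2$ denotes the Hessian operator. Thus, $V(\theta)$ has entries

\[ V_{ij}(\theta) = \kappa_{ij}(\theta) = \frac{\partial^2\kappa(\theta)}{\partial\theta_i\,\partial\theta_j}. \]

Moreover, all moments and cumulants of $T_n$ depend smoothly on $\theta$ on the interior of $\Theta$.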
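
The identity $E_\theta(T_1) = \nabla\kappa(\theta)$ in Lemma 1 is easy to check numerically. The sketch below does so for the Gamma family of Section 4.1, taking $T(x) = (\log x, -x)$ and $\kappa(\theta) = \log\Gamma(\theta_1) - \theta_1\log\theta_2$; the gradient is approximated by central finite differences and the mean by simulation. The function names, step size and simulation size are illustrative choices, not part of the paper.

```python
import math
import random


def kappa(t1, t2):
    """Cumulant function of the Gamma family with T(x) = (log x, -x)."""
    return math.lgamma(t1) - t1 * math.log(t2)


def grad_kappa(t1, t2, h=1e-5):
    """Central finite-difference approximation to the gradient of kappa."""
    d1 = (kappa(t1 + h, t2) - kappa(t1 - h, t2)) / (2 * h)
    d2 = (kappa(t1, t2 + h) - kappa(t1, t2 - h)) / (2 * h)
    return d1, d2


def mean_T(t1, t2, n=50_000, seed=0):
    """Monte Carlo estimate of E_theta{T(X)} for shape t1 and rate t2."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n):
        x = rng.gammavariate(t1, 1.0 / t2)  # second argument is the scale
        s1 += math.log(x)
        s2 -= x
    return s1 / n, s2 / n
```

For example, `grad_kappa(3.0, 2.0)` and `mean_T(3.0, 2.0)` should agree to within Monte Carlo error.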

Our results apply to exponential families where $T_n$ has a density relative to Lebesgue
measure. We assume the following condition.

Condition D. For every compact subset $\Gamma$ of $\mathrm{int}(\Theta)$, there is an integer $r$ such that the
characteristic function

\[ \eta_\theta(\phi) \equiv E_\theta\{\exp(i\phi' T_r)\} = \exp[r\{\kappa(\theta + i\phi) - \kappa(\theta)\}] \]

is integrable for all $\theta \in \Gamma$ and

\[ \sup_{\theta\in\Gamma} \int |\eta_\theta(\phi)|\,d\phi < \infty. \]

Condition D has two consequences we need. First, it means the matrix $\mathrm{Var}_\theta(T_1) =
\nabla^2\kappa(\theta)$ is positive definite for each $\theta \in \mathrm{int}(\Theta)$. This implies the map $\theta \mapsto \mu(\theta) = \nabla\kappa(\theta)$ is
an open bijective mapping of $\mathrm{int}(\Theta)$ to $\mu(\mathrm{int}(\Theta))$. A second consequence is that $T_n$ has
a bounded continuous density for each $\theta \in \Gamma$ and $n \ge r$. In the examples it will be useful
to know the converse is also true. The following lemma is essentially Theorem 19.1 in
Bhattacharya and Ranga Rao [2], page 180; see also Lemma 6 in Section 6 below.

Lemma 2. Condition D is equivalent to Condition D*.

Condition D*. For every compact subset $\Gamma$ of $\mathrm{int}(\Theta)$ there is an integer $r$ such that $T_r$
has continuous (Lebesgue) density $f_r(t;\theta)$ for each $\theta \in \Gamma$ and

\[ \sup_{\theta\in\Gamma}\,\sup_{t\in\mathbb{R}^k} f_r(t;\theta) < \infty. \]

As in the Introduction, we let $G_n(\cdot|t)$ denote the conditional cumulative distribution
function of $S_n$ given $T_n = t$. Also let $H_n(\cdot;\theta)$ denote the unconditional cumulative
distribution function of $S_n$ when $\theta$ is the value of the parameter.

We will show that for statistics which are sums as in (3) below or of the Cramér-
von Mises type these two cumulative distributions are uniformly close provided that $t$
and $\theta$ are related properly, that is, $t = n\mu = n\nabla\kappa(\theta)$.

Our results use a minor modification of Corollary 3.6 of Holst [4] which establishes this
uniform closeness for statistics which are sums over the data as described below. We use
the following notation. By $\mathcal{L}(S_n;\theta)$ we mean the unconditional distribution of $S_n$ under
the model with true parameter $\theta$. By $\mathcal{L}(S_n|T_n = t)$ we mean the conditional distribution
of $S_n$ given $T_n = t$. We use the symbol $\Rightarrow$ to denote convergence in distribution (weak
convergence) and $\mathcal{L}(W)$ and similar notation for limiting distributions. Our version of
Holst's results is:

Lemma 3. Assume Condition D. Suppose that $u_n(\cdot;\cdot)$ is a sequence of measurable functions
mapping $\Omega \times \Theta$ to $\mathbb{R}^m$. Let

\[ S_n(\theta) = n^{-1/2}\sum_{i=1}^n [u_n(X_i,\theta) - E_\theta\{u_n(X_i,\theta)\}]. \tag{3} \]
Assume that for any deterministic sequence $\theta_n$ of parameter values converging to some
$\theta \in \mathrm{int}(\Theta)$ the joint law

\[ \mathcal{L}_{\theta_n}\big(S_n(\theta_n),\, n^{-1/2}\{T_n - n\mu(\theta_n)\}\big) \]

converges to multivariate normal with mean 0 and variance-covariance matrix of the
form

\[ \begin{pmatrix} A(\theta) & B(\theta) \\ B'(\theta) & V(\theta) \end{pmatrix}, \]

which may depend on $\theta$ but not on the specific sequence $\theta_n$. Then with $S_n$ denoting $S_n(\theta_n)$
we have for every such sequence $\theta_n$

\[ \mathcal{L}(S_n|T_n = t_n) = G_n(\cdot|n\mu(\theta_n)) \Rightarrow \mathrm{MVN}(0, A(\theta) - B(\theta)V^{-1}(\theta)B'(\theta)), \]

where $t_n = n\mu(\theta_n)$. Moreover, for every compact subset $\Gamma$ of $\mathrm{int}(\Theta)$ we have

\[ \lim_{n\to\infty}\,\sup_{-\infty<x<\infty}\,\sup_{\theta\in\Gamma} |G_n(x|n\mu) - H_n(x;\theta)| = 0. \]

The condition involving the sequence $\theta_n$ amounts to requiring that the central limit
theorem apply uniformly on compact subsets of $\Theta$. Our main result extends the last
conclusion of the lemma to statistics of the Cramér-von Mises type for the case where $\Omega$
is the real line; see Remark 8 in Section 5 for discussion of more general sample spaces.

Theorem 1. Suppose $S_n$ is as defined in (2). Suppose the weight $\psi$ is continuous on
$[0,1]$. Assume Condition D. Then for every compact subset $\Gamma$ of $\mathrm{int}(\Theta)$ we have

\[ \lim_{n\to\infty}\,\sup_{-\infty<x<\infty}\,\sup_{\theta\in\Gamma} |G_n(x|n\mu) - H_n(x;\theta)| = 0. \tag{4} \]

The theorem asserts that two distribution functions, one conditional, the other unconditional,
are close together everywhere and simultaneously for all $\theta$ belonging to some
compact set. In the Introduction, we described our results in terms of P-values; we now
recast the theorem in those terms. The conditional P-value, now denoted $P_{c,n}$, is

\[ P_{c,n} = 1 - G_n(S_n|T_n). \]

The unconditional P-value, $P_{u,n}$, is

\[ P_{u,n} = 1 - H_n(S_n;\hat\theta_n). \]

We then have the following result which also clarifies the sampling properties of the 
distributions G n and H n evaluated at sample estimates. 

Theorem 2. Assume the conditions of Theorem 1.

(a) If $X_1, X_2, \ldots$ is an i.i.d. sequence generated from the model with true parameter
value $\theta \in \mathrm{int}(\Theta)$ (i.e., if the null hypothesis is true and the true parameter value is
not on the boundary of the parameter space), then

\[ \lim_{n\to\infty}\,\sup_{-\infty<x<\infty} |G_n(x|T_n) - H_n(x;\hat\theta_n)| = 0 \quad \text{almost surely} \]

and

\[ P_{c,n} - P_{u,n} \to 0 \quad \text{almost surely.} \]

(b) Suppose $X_1, X_2, \ldots$ is an i.i.d. sequence generated from some fixed alternative
distribution. Suppose that for this alternative $E(T_1) = \mu_a$ exists and is in the open set
$\mu(\mathrm{int}(\Theta))$, that is, the image of the interior of $\Theta$ under the map $\theta \mapsto \mu$. Then both
conclusions of part (a) still hold. In particular, if one test is consistent against the
alternative then so is the other.



Details of proofs are in Section 6 but here we outline the strategy of proof for our
Theorem 1. Fix a complete orthonormal system of functions $g_j$ defined on $[0,1]$; for
definiteness we take $g_j(s) = \sqrt{2}\sin(\pi j s)$. Define

\[ U_{n,j} = \int_0^1 \psi(s)W_n(s)g_j(s)\,ds. \]

Then by Parseval's identity

\[ S_n = \sum_{j=1}^\infty U_{n,j}^2. \]

The proof then has the following steps:

1. The sequence of distribution functions $H_n(\cdot;\theta)$ converges weakly to a limiting
distribution function $H_\infty(\cdot;\theta)$; the convergence is uniform on compact subsets of $\Theta$.
The distribution in question is the law of

\[ \int_0^1 \psi^2(s)W^2(s)\,ds = \sum_{j=1}^\infty U_{\infty,j}^2, \]

where we define

\[ U_{\infty,j} = \int_0^1 \psi(s)W(s)g_j(s)\,ds. \]

This reduces the problem to proving that the sequence $G_n(\cdot|n\mu)$ converges uniformly
on compact subsets of $\mathrm{int}(\Theta)$ to $H_\infty(\cdot;\theta)$ where $\mu = \nabla\kappa(\theta)$.

2. Uniform convergence is established by considering an arbitrary sequence $\theta_n$ of
parameter values converging to some $\theta \in \mathrm{int}(\Theta)$ and showing that, with $\mu_n = \nabla\kappa(\theta_n)$,

\[ \lim_{n\to\infty}\,\sup_{-\infty<x<\infty} |G_n(x|n\mu_n) - H_\infty(x;\theta)| = 0. \tag{5} \]

3. Apply standard weak convergence ideas to see that for each $K$ fixed

\[ \mathcal{L}((U_{n,1}, \ldots, U_{n,K});\theta_n) \Rightarrow \mathcal{L}((U_{\infty,1}, \ldots, U_{\infty,K})). \]

4. Use Holst's results to prove that

\[ \mathcal{L}((U_{n,1}, \ldots, U_{n,K})|T_n = n\mu_n) \Rightarrow \mathcal{L}((U_{\infty,1}, \ldots, U_{\infty,K})); \]

this is the same joint limit law as in the previous step.

5. Prove the sequence $\mathcal{L}(S_n|T_n = n\mu_n)$ of conditional distributions of $S_n$ is tight.

6. Prove that there is a sequence $K_n$ tending to infinity sufficiently slowly that

\[ \mathcal{L}\Big(\sum_{j=1}^{K_n} U_{n,j}^2;\theta_n\Big) \Rightarrow \mathcal{L}\Big(\int_0^1 \psi^2(s)W^2(s)\,ds\Big). \]

7. Prove the corresponding conditional result given $T_n = n\mu_n$.

8. Prove that for any sequence $K_n$ tending to infinity

\[ \sum_{j=K_n+1}^\infty U_{n,j}^2 \]

converges to 0 in probability given $T_n = n\mu_n$.

9. Apply Slutsky's theorem to 6, 7 and 8 and use $S_n = \sum_{j=1}^\infty U_{n,j}^2$ to see

\[ \mathcal{L}(S_n|T_n = n\mu_n) \Rightarrow \mathcal{L}\Big(\int_0^1 \psi^2(s)W^2(s)\,ds\Big), \]

which establishes (5) and completes the proof.
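
The Parseval decomposition used in this outline can be checked numerically in the simplest case. The sketch below assumes unit weight, $\psi \equiv 1$, and works with the values $u_i = F(X_i,\hat\theta_n)$. With $g_j(s) = \sqrt{2}\sin(\pi j s)$, a direct integration (carried out here for the example, not quoted from the paper) gives $U_{n,j} = (2/n)^{1/2}\sum_i \cos(\pi j u_i)/(\pi j)$, so a long partial sum of the $U_{n,j}^2$ should reproduce the usual computing formula for the statistic. Function names are hypothetical.

```python
import math


def cvm_closed_form(u):
    """Computing formula for the Cramér-von Mises statistic with unit weight."""
    n = len(u)
    v = sorted(u)
    return 1.0 / (12 * n) + sum((v[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))


def component(u, j):
    """U_{n,j} for the sine basis g_j(s) = sqrt(2) sin(pi j s) and unit weight."""
    n = len(u)
    total = sum(math.cos(math.pi * j * ui) for ui in u)
    return math.sqrt(2.0 / n) * total / (math.pi * j)


def truncated_sum(u, n_terms):
    """Partial sum of the squared components, j = 1, ..., n_terms."""
    return sum(component(u, j) ** 2 for j in range(1, n_terms + 1))
```

The tail of the series is of order $1/K$ after $K$ terms, which is why the proof needs the separate truncation argument of steps 6 to 8 rather than a fixed number of components.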



2.2. Unconditional limits 

We now consider the random function $Y_n(t) = \psi(t)W_n(t)$ and review some well known
facts about the unconditional limiting distributions of the processes $Y_n$; see Shorack and
Wellner [12], for example. If $\theta_n$ converges to $\theta$, then the unconditional laws of $Y_n$ converge
weakly in $D[0, 1]$ to the law of a Gaussian process $Y$ with mean 0 and covariance

\[ \zeta_\theta(s,t) = \psi(s)\rho_\theta(s,t)\psi(t). \]

The covariance $\zeta_\theta$ is square integrable over the unit square; it is convenient to suppress $\theta$
in the notation for what follows. There is a sequence of bounded continuous orthonormal
eigenfunctions $\chi_j(t)$, $j = 1, 2, \ldots$, with corresponding eigenvalues $\lambda_j$ such that

\[ \int_0^1 \zeta(s,t)\chi_j(t)\,dt = \lambda_j\chi_j(s). \]
Then

\[ \int_0^1 Y^2(t)\,dt = \sum_j \lambda_j Z_j^2, \tag{6} \]

where

\[ Z_j = \lambda_j^{-1/2}\int_0^1 Y(t)\chi_j(t)\,dt. \]

The $Z_j$ are independent standard normal. Let $H_\infty(\cdot;\theta)$ denote the cumulative distribution
function of (6). It is then standard that, for each compact subset $\Gamma$ of $\mathrm{int}(\Theta)$,

\[ \lim_{n\to\infty}\,\sup_{-\infty<x<\infty}\,\sup_{\theta\in\Gamma} |H_n(x;\theta) - H_\infty(x;\theta)| = 0. \]

Our main result will therefore follow if we establish (5).
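
A limit law of the form (6) is straightforward to simulate once the eigenvalues are available. The sketch below is an illustration only: it assumes the simplest situation, unit weight and no estimated parameters, where the covariance reduces to $\min\{s,t\} - st$ and the eigenvalues are $\lambda_j = 1/(\pi j)^2$. In the estimated-parameter case treated in this paper the eigenvalues depend on $\theta$ and would have to be computed numerically from $\zeta_\theta$. Function names and the truncation level are choices made for the example.

```python
import math
import random


def simple_hypothesis_eigenvalues(n_terms):
    """Eigenvalues 1/(pi j)^2 of the kernel min(s, t) - st (assumed special case)."""
    return [1.0 / (math.pi * j) ** 2 for j in range(1, n_terms + 1)]


def simulate_limit(eigenvalues, n_draws, seed=0):
    """Draws from the truncated series sum_j lambda_j Z_j^2, Z_j i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    return [
        sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigenvalues)
        for _ in range(n_draws)
    ]


def empirical_cdf(draws, x):
    """Monte Carlo estimate of the limiting distribution function at x."""
    return sum(d <= x for d in draws) / len(draws)
```

In this special case the series has mean $\sum_j \lambda_j = 1/6$, which provides a simple check on the simulation.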

Next, recall that $W_n$ converges weakly to the Gaussian process $W$ with covariance
function $\rho_\theta$. The map

\[ f \mapsto \Big(\int_0^1 f(s)\psi(s)g_1(s)\,ds, \ldots, \int_0^1 f(s)\psi(s)g_K(s)\,ds\Big) \]

is continuous from $D[0, 1]$ to $\mathbb{R}^K$ so that

\[ (U_{n,1}, \ldots, U_{n,K}) \Rightarrow (U_{\infty,1}, \ldots, U_{\infty,K}). \]

This limit vector has a multivariate normal distribution with mean 0 and covariance

\[ \mathrm{Cov}(U_{\infty,i}, U_{\infty,j}) = \int_0^1\!\!\int_0^1 g_i(s)g_j(t)\zeta_\theta(s,t)\,ds\,dt. \tag{7} \]

It follows that

\[ \sum_{j=1}^K U_{n,j}^2 \Rightarrow \sum_{j=1}^K U_{\infty,j}^2. \]

Since

\[ \int_0^1 \psi^2(s)W^2(s)\,ds = \sum_{j=1}^\infty U_{\infty,j}^2 \]

almost surely we have, for any sequence $K_n$ tending to infinity, that

\[ \sum_{j=1}^{K_n} U_{\infty,j}^2 \to \int_0^1 \psi^2(s)W^2(s)\,ds. \]

This completes the analysis of the unconditional limit behaviour of $S_n$. The next
subsection considers the conditional limit behaviour.

2.3. Convergence of finite dimensional distributions: conditional case

In the following, all distributional assertions are statements about the conditional
distribution of the objects involved given $T_n = n\mu_n$ for a specific sequence $\theta_n$ converging to
some $\theta \in \Gamma$ and $\mu_n = \nabla\kappa(\theta_n)$. We apply Lemma 3 as follows. We have

\[ U_{n,j} = \int_0^1 \psi(t)W_n(t)g_j(t)\,dt = n^{-1/2}\sum_{i=1}^n \Phi_{jn}(X_i), \]

where

\[ \Phi_{jn}(x) = \int_0^1 [1\{F(x;\theta_n) \le t\} - t]\,\psi(t)g_j(t)\,dt. \]

It follows from Lemma 3 that

\[ \mathcal{L}((U_{n,1}, \ldots, U_{n,K})|T_n = n\mu_n) \Rightarrow \mathcal{L}((U_{\infty,1}, \ldots, U_{\infty,K})). \]

The vector $(U_{\infty,1}, \ldots, U_{\infty,K})$ has a multivariate normal distribution with mean 0 and
variance-covariance matrix with entries as at (7). This is the same limit behaviour as in
the unconditional case. Thus,

\[ \mathcal{L}\Big(\sum_{j=1}^K U_{n,j}^2 \Big| T_n = n\mu_n\Big) \Rightarrow \mathcal{L}\Big(\sum_{j=1}^K U_{\infty,j}^2\Big). \]

Again this is the same weak limit as in the previous section. Finally, since convergence
in distribution is metrizable there is a sequence $K_n$ tending to infinity so slowly that

\[ \mathcal{L}\Big(\sum_{j=1}^{K_n} U_{n,j}^2 \Big| T_n = n\mu_n\Big) \Rightarrow \mathcal{L}\Big(\sum_{j=1}^\infty U_{\infty,j}^2\Big). \]

We need only show, therefore, that for any sequence $K_n$ tending to infinity we have,
conditionally on $T_n = n\mu_n$,

\[ \sum_{j=K_n+1}^\infty U_{n,j}^2 \to 0 \]

in probability. It suffices to show that

\[ E\Big(\sum_{j=K_n+1}^\infty U_{n,j}^2 \Big| T_n = n\mu_n\Big) \to 0. \tag{8} \]

We will prove this from the following statements. First, we will show that for each fixed $j$

\[ E(U_{n,j}^2|T_n = n\mu_n) \to E\{U_{\infty,j}^2\}. \tag{9} \]
This shows that for each fixed $K$ we have

\[ E\Big(\sum_{j=1}^K U_{n,j}^2 \Big| T_n = n\mu_n\Big) \to E\Big(\sum_{j=1}^K U_{\infty,j}^2\Big). \tag{10} \]

Finally, we will show that

\[ E\Big(\sum_{j=1}^\infty U_{n,j}^2 \Big| T_n = n\mu_n\Big) \to E\Big(\sum_{j=1}^\infty U_{\infty,j}^2\Big). \tag{11} \]

Assertion (8) is a straightforward consequence of (9) and (11). It is now straightforward
to apply Slutsky's theorem to complete the proof of the main theorem.

Statements (9) and (11) are proved in Section 6. The proofs relate $E(U_{n,j}^2|T_n = n\mu_n)$
to an integral involving

\[ E(1(X_i \le x)1(X_k \le y)|T_n = n\mu_n) \]

and other similar Rao-Blackwell estimates. They then use a conditional Edgeworth expansion
of Rao-Blackwell estimates which is of some interest in its own right. We describe
these expansions in the next section.



3. Conditional Edgeworth expansions 

In this section, we compute the first term in an Edgeworth expansion of the conditional
expectation of a function of $X_1, \ldots, X_m$ given $T_n$. We will focus on uniformity, extending
the work of Portnoy [11]. The calculations may be interpreted as a computation of
the difference, to order $1/n$, between a Rao-Blackwell estimate of a parameter and the
maximum likelihood estimate.

Our results use the Edgeworth expansion of the density of $T_n$. Assuming Condition D,
for $n \ge r$ the quantity $\{T_n - n\mu(\theta)\}/\sqrt{n}$ has a density $q_n(\cdot;\theta)$. The following lemma
is essentially a uniform version of Theorem 19.2 in Bhattacharya and Ranga Rao [2];
see Holst [4], Yuan and Clarke [15]. It extends a lemma appearing in Lockhart and
O'Reilly [6]. Let $u$ denote a $k$ vector with entries $u_1, \ldots, u_k$.

Lemma 4. Assume Condition D. Then there are functions

\[ \psi_j(u;\theta), \quad j = 1, 2, \ldots, \]

and

\[ \psi_{jk}(u;\theta), \quad k = 0, \ldots, j + 2, \]

such that
such that 

1. $\psi_{jk}$ is homogeneous of degree $k$ as a function of $u_1, \ldots, u_k$. That is,

\[ \psi_{jk}(u_1, \ldots, u_k;\theta) = \sum_{i_1, \ldots, i_k} a_{jk;i_1\cdots i_k}(\theta)\,u_{i_1}\cdots u_{i_k} \]

for some coefficients $a_{jk;i_1\cdots i_k}(\theta)$ not depending on $u$.

2. If $j - k$ is odd, then $\psi_{jk} = 0$.

3. $\psi_j$ is a polynomial of degree $j + 2$ as a function of $u$ given by

\[ \psi_j(u;\theta) = \sum_{k=0}^{j+2} \psi_{jk}(u;\theta). \]

4. The coefficients $a_{jk;i_1\cdots i_k}(\theta)$ in these polynomials are smooth functions of $\theta$.

5. Fix an integer $s > 0$ and a compact subset $\Gamma$ of $\mathrm{int}(\Theta)$. Let $\phi(u,V)$ be the multivariate
normal density with mean 0 and covariance matrix $V$. Then

\[ \limsup_{n\to\infty}\, n^{(s+1)/2}\,\sup_{\theta\in\Gamma}\,\sup_u \Big| q_n(u;\theta) - \phi\{u,V(\theta)\}\Big\{1 + \sum_{j=1}^{s} \frac{\psi_j(u;\theta)}{n^{j/2}}\Big\}\Big| < \infty. \]

We will use this lemma with $s = 3$ to get an error rate on our one-term expansion. We
need the following notation. Define

\[ B_m(x_1, \ldots, x_m) = \sum_{i=1}^m \{T(x_i) - \mu\} \]

and let $B_m$ denote the random vector

\[ B_m(X_1, \ldots, X_m) = T_m - m\mu. \]

Let $D = V^{-1}$ be the inverse of the variance-covariance matrix $V$. The lowest degree term
in the polynomial $\psi_1$ has the form

\[ \psi_{1,1}(u) = -\sum_{\ell=1}^k a_{1,1;\ell}(\theta)\,u_\ell, \]

where, from Bhattacharya and Ranga Rao [2], page 55, we have

\[ a_{1,1;\ell} = \sum_i \kappa_{iii}D_{ii}D_{i\ell}/2 + \sum_{i\ne j} \kappa_{iij}(2D_{ij}D_{i\ell} + D_{ii}D_{j\ell})/2
+ \sum_{i<j<k} \kappa_{ijk}(D_{ij}D_{k\ell} + D_{ik}D_{j\ell} + D_{jk}D_{i\ell}). \]

If $J(x_1, \ldots, x_m)$ is a real valued measurable function on $\Omega^m$, we let $J = J(X_1, \ldots, X_m)$.
Remember in the following that $\mu$ and $\theta$ are related through $E_\theta(T_n) = n\mu$.

Theorem 3. Fix an integer $m > 0$. Suppose $\bar J \ge 0$ is a real valued measurable function
on $\Omega^m$ such that

\[ E_\theta\{\bar J(X_1, \ldots, X_m)\} < \infty \]

for all $\theta \in \mathrm{int}(\Theta)$. Then for each compact subset $\Gamma$ of $\mathrm{int}(\Theta)$ we have

\[ \limsup_{n\to\infty}\, n^2\,\sup_{\theta\in\Gamma}\,\sup_J |E\{J|T_n = n\mu\} - A(n,J,\theta)| < \infty, \tag{12} \]

where



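\[ A(n,J,\theta) = E_\theta(J) + \frac{R(J,\theta)}{n}, \]

\begin{align*}
R(J,\theta) &= \frac{mk}{2}E_\theta(J) - \frac{1}{2}E_\theta\{J B_m' V^{-1}(\theta) B_m\} - E_\theta\{J\psi_{1,1}(B_m)\} \tag{13} \\
&= -\frac{1}{2}\mathrm{tr}\{V^{-1}(\theta)\nabla^2 E_\theta(J)\} - \psi_{1,1}\{\nabla E_\theta(J)\}. \tag{14}
\end{align*}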

The supremum over $J$ is over all measurable $J$ defined on $\Omega^m$ with $|J| \le \bar J$ (almost
everywhere). Moreover,

\[ \sup_{\theta\in\Gamma}\,\sup_{J : |J| \le \bar J} R(J,\theta) = O(1). \tag{15} \]

In (14), the symbols $\nabla$ and $\nabla^2$ are as in Lemma 1. It is part of the theorem that the
quantities on the right in (13) and (14) are equal.

4. Examples 

In this section, we consider the Gamma and von Mises models and show that the theory 
of the previous sections applies. These two models were considered in Lockhart, O'Reilly 
and Stephens [7, 8] where Gibbs sampling was used to implement the conditional tests 
discussed here via Markov Chain Monte Carlo. In the case of the Gamma distribution, we 
also illustrate the use of the expansion of the Rao-Blackwell estimate by giving a formula 
for an approximate Rao-Blackwell estimate of the shape parameter. 

4.1. The Gamma distribution 

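Suppose $X_1, X_2, \ldots$ are i.i.d. with density

\[ f(x;\alpha,\beta) = \frac{1}{x\Gamma(\alpha)}\Big(\frac{x}{\beta}\Big)^{\alpha}\exp(-x/\beta)1(x > 0). \]

We take $\theta_1 = \alpha$, $\theta_2 = 1/\beta$ and $\Theta = \{\theta : \theta_1 > 0, \theta_2 > 0\}$. We then have

\[ T(x) = (\log(x), -x) \]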
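and

\[ \kappa(\theta_1,\theta_2) = \log\Big\{\frac{\Gamma(\theta_1)}{\theta_2^{\theta_1}}\Big\}. \]

The characteristic function of $T$ is

\[ E_\theta\{\exp(i\phi' T)\} = \frac{\Gamma(\theta_1 + i\phi_1)}{\Gamma(\theta_1)}\,\frac{\theta_2^{\theta_1}}{(\theta_2 + i\phi_2)^{\theta_1 + i\phi_1}}. \]

Fix a compact set $\Gamma$ in the parameter space and let

\[ \epsilon = \inf\{\theta_1 : \exists\,\theta_2 : (\theta_1,\theta_2) \in \Gamma\}. \]

In Section 6, we use properties of the Gamma function in the complex plane to show
that for $r$ so large that $r\epsilon > 2$ and $r > 4$ we have

\[ \sup_{\theta\in\Gamma} \int |E_\theta\{\exp(i\phi' T)\}|^r\,d\phi_1\,d\phi_2 < \infty. \tag{16} \]

This establishes Condition D in this case.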

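For completeness, we record here the functions needed to apply Theorem 3 to this
family. Let $\psi(\theta) = d\log\Gamma(\theta)/d\theta$ denote the digamma function and let $\psi'$ and $\psi''$ denote
its first and second derivatives. Let $\delta = \theta_1\psi'(\theta_1) - 1$. Then we find

\begin{align*}
\mu_1 &= \psi(\theta_1) - \log(\theta_2), & \mu_2 &= -\theta_1/\theta_2, \\
V_{11} &= \psi'(\theta_1), & V_{12} &= V_{21} = -1/\theta_2, \\
V_{22} &= \theta_1/\theta_2^2, & D_{12} &= D_{21} = \theta_2/\delta, \\
D_{11} &= \theta_1/\delta, & D_{22} &= \theta_2^2\psi'(\theta_1)/\delta, \\
\kappa_{111} &= \psi''(\theta_1), & \kappa_{112} &= \kappa_{121} = \kappa_{211} = 0, \\
\kappa_{222} &= -2\theta_1/\theta_2^3, & \kappa_{122} &= \kappa_{212} = \kappa_{221} = 1/\theta_2^2,
\end{align*}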



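\[ a_{1,1;1} = \frac{\theta_1^2\psi''(\theta_1) - \theta_1\psi'(\theta_1) + 2}{2\delta^2}, \qquad
a_{1,1;2} = \frac{\theta_2[\theta_1\psi''(\theta_1) - 2\theta_1\{\psi'(\theta_1)\}^2 + 3\psi'(\theta_1)]}{2\delta^2}. \]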

These formulas may be used to give approximations, in terms of the maximum likelihood
estimate $\hat\theta$, to order $1/n$ of the Rao-Blackwell estimate of a parameter. As an example,
we consider the approximation to the Rao-Blackwell estimate of the shape parameter $\theta_1$.
In this case $E_\theta(J) = \theta_1$ so the Hessian matrix in $R(J,\theta)$ is 0 and the gradient is simply
$(1,0)'$. Our approximation from (14) is then
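\[ \tilde\theta_1 = \hat\theta_1 - \frac{\psi_{1,1}(1,0)}{n} = \hat\theta_1 + \frac{\hat\theta_1^2\psi''(\hat\theta_1) - \hat\theta_1\psi'(\hat\theta_1) + 2}{2n\{\hat\theta_1\psi'(\hat\theta_1) - 1\}^2}. \]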

Remark. I do not know if there is, for some value of $m$, an unbiased estimate of $\theta_1$.
That is, I do not know if $J$ exists in the calculation just given. It seems worth noting
that the expansion can be computed anyway since the terms therein depend only on the
function of the parameters which is being estimated and the derivatives of that function.

4.2. The von Mises distribution 

Suppose $X_1, X_2, \ldots$ are i.i.d. with density

\[ f(x;\alpha,x_0) = \frac{1}{2\pi I_0(\alpha)}\exp\{\alpha\cos(x - x_0)\}1(0 < x < 2\pi), \]

where $I_0$ is the modified Bessel function of the first kind of order 0. We take $\theta_1 = \alpha\cos(x_0)$,
$\theta_2 = \alpha\sin(x_0)$ and $\Theta = \mathbb{R}^2$. We then have

\[ T(x) = (\cos(x), \sin(x)). \]

Here we find it easier to verify Condition D*. For a sample of size $m$ the density of the
sufficient statistics is known analytically in the case $\theta_1 = \theta_2 = 0$, that is, when the
distribution is uniform on the interval $(0, 2\pi)$. Write $T_m$ in polar coordinates as $(R\cos\delta, R\sin\delta)$
with the angle $\delta$ in $[0, 2\pi)$ and $R = \|T_m\|$; then $R$ and $\delta$ are independent. The distribution
of $\delta$ is uniform on $[0, 2\pi)$. From Stephens [13], we find $R$ has the density

\[ f_m(u) = u\int_0^\infty J_0(ut)J_0^m(t)\,t\,dt, \]

where $J_0$ is the Bessel function of the first kind of order 0. The function $J_0(t)$ is bounded
and decays at infinity like $t^{-1/2}$. So for all $m > 4$ there is a constant $C_m$ such that

\[ f_m(u) \le C_m u \]

for all $u > 0$. The density $f_m$ vanishes for negative $u$ and for $u > m$. Change variables to
see that for all $m \ge 5$ the density of $T_m$ is bounded by $C_m/(2\pi)$. For $\theta = (\theta_1, \theta_2)$ not
equal to 0, the likelihood ratio of $\theta$ to 0 is $\exp(\theta' T_m)/I_0^m(\|\theta\|)$. Since the density of $T_m$ for $\theta$ is the
density for 0 multiplied by the likelihood ratio, Condition D* holds with $r = 5$.

5. Discussion 

We conclude with a series of remarks. 

Remark 1. For a given goodness-of-fit test statistic we may compute P-values in several
ways. The parametric bootstrap technique proceeds by estimating the unknown param- 
eters and then generating a large number of samples from the hypothesized distribution 
using the estimated value of the parameters. Except in location-scale models the result- 
ing tests are approximate; that is, the distribution of the P-value is not exactly uniform 
though it becomes more so as the sample size increases. 

An alternative technique is to compute a conditional P-value using

\[ P(S_n > s|T_n) \]

evaluated at $s$ equal to the observed value of $S_n$. This P-value must generally be evaluated
by Monte Carlo methods. For some distributions, such as the Inverse Gaussian, there is
a direct way to simulate samples from the conditional distribution of the data given $T_n$.
See O'Reilly and Gracia-Medrano [10]. For other distributions, Markov Chain Monte
Carlo may be used; see Lockhart, O'Reilly and Stephens [7, 8].

If the null hypothesis is true and the true value of $\theta$ is in $\operatorname{int}(\Theta)$, then we have shown
that the difference between these two P-values converges almost surely to 0. In our
experience, these two P-values are usually extremely close together, suggesting the agreement
extends to some higher order expansion; I do not know how to show such a thing.

Remark 2. Indeed this equivalence of P-values requires only a large sample size and
an estimate $\hat\theta$ not too close to the boundary of $\Theta$. It is not at all necessary that the
null hypothesis be true. Of course if the null hypothesis is not true the estimate $\hat\theta_n$
could converge to the boundary of the parameter space and then our results permit the
P-values to be different even in large samples.

Remark 3. For fixed alternatives, our results imply that the difference in powers
between the two tests tends to 0 except when $T_n/n$ does not have a limit in $\mu(\operatorname{int}(\Theta))$. The
conclusions in Theorem 2 can be extended to contiguous sequences of alternatives yielding
conclusions that the two tests have identical limiting powers along such sequences.

Remark 4. The local central limit theorem for lattice distributions may be used to
prove the equivalent of Theorem 1 if T(x) takes values in a lattice and the data are 
discrete. 

Remark 5. The result also extends to a variety of other statistics such as 



Jo Jo 

or any other suitable quadratic form in the process $W_n$, under regularity conditions on
the weight functions $\psi$, the kernel $K$, or the quadratic form.

Remark 6. One important case not covered by our proof is the Anderson-Darling test
which is of the Cramer-von Mises type but with weight function

$$\psi(s) = 1/\sqrt{s(1-s)}$$

which is not square integrable. It may be possible to verify our assertions (9) and (11)
by more careful analysis of the conditional moments of $W_n$ near the ends of the unit
interval.

Remark 7. Our proofs show that the Edgeworth expansion to order $2s$ given in Lemma 4
may be used to provide an expansion of any Rao-Blackwell estimate about the maximum
likelihood estimate of $E_\theta(J)$ in inverse powers of $n$ out to terms of order $n^{-s}$ with
a remainder which is $O(n^{-(s+1)})$ uniformly on compact subsets of $\operatorname{int}(\Theta)$. We have not
done the algebra for any $s > 1$ but we can state the following theorem.

Theorem 4. Under the conditions of Theorem 3, there are functions $R_j(J,\theta)$ for
$j = 1, 2, \ldots$, such that for any integer $s \ge 1$ we have

$$\limsup_{n\to\infty}\; n^{1+s} \sup_{|J| \le \bar J}\; \sup_{\theta \in \Gamma} \bigl| E\{J \mid T_n = n\mu\} - A_s(n,J,\theta) \bigr| < \infty, \qquad (17)$$

where

$$A_s(n,J,\theta) = E_\theta(J) + \sum_{j=1}^{s} \frac{R_j(J,\theta)}{n^j}.$$

The functions $R_j$ are computed using Taylor expansions as in Theorem 3 and collecting
terms in inverse powers of $n$. Each $R_j$ is bounded uniformly over $\theta \in \Gamma$ and $|J| \le \bar J$.

Of course $R_1$ is just $R$ of Theorem 3 and the point is that the arguments in the proof
of that theorem can be applied to all remainder terms occurring here.
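
A case where an expansion of this kind can be checked by hand is the exponential model with $J = 1(X_1 \le x)$. The sketch below relies on the standard fact that, given the sum $T_n$, the ratio $X_1/T_n$ has a Beta$(1, n-1)$ distribution; the first order coefficient $e^{-z}(z^2/2 - z)$ with $z = x/\mu$ comes from a Taylor expansion done for this example and is not taken from the original. All function names are hypothetical.

```python
import math


def rao_blackwell_cdf(x, n, mean):
    """P(X_1 <= x | T_n = n * mean) for i.i.d. exponential data,
    using X_1 / T_n ~ Beta(1, n - 1)."""
    total = n * mean
    if x <= 0:
        return 0.0
    if x >= total:
        return 1.0
    return 1.0 - (1.0 - x / total) ** (n - 1)


def plug_in_cdf(x, mean):
    """Maximum likelihood estimate of P(X_1 <= x): exponential CDF at the sample mean."""
    return 1.0 - math.exp(-x / mean)


def first_order_term(x, mean):
    """Coefficient of 1/n in the expansion of the Rao-Blackwell estimate
    about the plug-in estimate, derived for this illustration."""
    z = x / mean
    return math.exp(-z) * (z * z / 2.0 - z)


def scaled_remainder(x, n, mean):
    """n^2 times the error left after removing the 1/n term; it should stay bounded."""
    diff = rao_blackwell_cdf(x, n, mean) - plug_in_cdf(x, mean)
    return n * n * abs(diff - first_order_term(x, mean) / n)
```

Evaluating `scaled_remainder` for increasing `n` at a fixed `x` and `mean` gives values that level off rather than grow, which is the behaviour the theorem describes for $s = 1$.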

Remark 8. In Theorem 1, the $X_i$ are real valued; this is needed only for the weak
convergence results. In the von Mises case, for instance, it is useful to regard the
observation $X_i$ not as an angle but as a unit vector $X_i$ as was suggested in the introduction.
This makes $T_n = \sum X_i$. In many examples, the $X_i$ can usefully be taken to be multivariate.
Our results may be expected to extend to any statistic admitting a sum of squares
expansion like that of Cramer-von Mises statistics.

Remark 9. The conditional tests described here have level identically equal to $\alpha$. In
the introduction, we noted that this is a necessary condition for an unbiased level $\alpha$ test
in models with a complete sufficient statistic. Though necessary, the condition is not
sufficient; we do not know how to check that a given conditional test is unbiased, nor
how to establish any optimal power properties for the tests considered here.



6. Proofs 

6.1. Proof of Lemma 3 

The proof in Holst [4] of his Corollary 3.6 extends directly to prove this lemma. However,
Holst's Corollary 3.6 assumes "the general conditions" of his Section 2. In particular, we
must verify the integrability hypothesis of his Proposition 2.1 which we now describe in
our notation. Let



$$\Psi_{r,\theta}(\zeta_1,\zeta_2) = E_\theta\{\exp(i\zeta_1 S_r(\theta) + i\zeta_2' T_r)\}$$

be the joint characteristic function of $S_r(\theta), T_r$. Holst requires that for each $\zeta_1$ and each
compact subset $\Gamma$ of $\operatorname{int}(\Theta)$ there is an $r > 0$ such that for all $\theta \in \Gamma$



Lemma 5. Condition D implies (18). In fact, $r$ can be chosen free of $\zeta_1$.

This is an easy consequence of the following lemma. 

Lemma 6. Suppose $X \in \mathbb{R}^1$ and $Y \in \mathbb{R}^m$ have joint distribution $F(dx,dy)$ and joint
characteristic function $\psi(u,v)$. Then

1. If $Y$ has density $f$ bounded by $M$ and $\psi$ is real valued and nonnegative, then



Proof. Statement 3 is a well-known consequence of the Fourier inversion formula.
Statement 2 follows from Statement 1 by symmetrization: if the pair $(X^*,Y^*)$ has the same
joint distribution as $(X, Y)$ and is independent of $(X, Y)$ then the second statement is
the first applied to $(X - X^*, Y - Y^*)$, noting that $Y - Y^*$ has a density also bounded
by $M$.

To prove Statement 1, we follow Feller [3], pages 480ff. Let $\xi$ denote the standard
normal density in $\mathbb{R}^m$. Then for each $a > 0$ the function $a^m\xi(ax)$ is a density with characteristic
function $(2\pi)^{m/2}\xi(u/a)$.




(18) 




2. If Y has density f bounded by M, then 






$$a^m \xi(av)\, e^{iux} \exp\{iv'(y - \zeta)\}\, F(dx, dy)\, dv$$

At $\zeta = 0$, we get

Now let $a \to 0$ to get Statement 1.



□ 
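
The smoothing argument can be sketched as follows. This is a reconstruction under stated assumptions, not the original displays: it assumes $X$ is real valued, $\psi(u,v) = \int e^{iux + iv'y} F(dx,dy)$, and the Gaussian kernel facts quoted in the proof; the final constant $(2\pi)^m M$ is what this sketch produces and may differ in form from the bound stated in the lemma.

```latex
% Step 1: integrate psi against the Gaussian density a^m xi(a v) and use Fubini;
% the inner v-integral is the characteristic function of that density at y - zeta.
\begin{align*}
\int a^m \xi(av)\, e^{-iv'\zeta}\, \psi(u,v)\, dv
  &= \iint a^m \xi(av)\, e^{iux} \exp\{iv'(y-\zeta)\}\, F(dx,dy)\, dv \\
  &= (2\pi)^{m/2} \int e^{iux}\, \xi\bigl((y-\zeta)/a\bigr)\, F(dx,dy).
\end{align*}
% Step 2: at zeta = 0, with psi >= 0 and Y having density f <= M,
% and using \int \xi(y/a) dy = a^m:
\begin{align*}
\int a^m \xi(av)\, \psi(u,v)\, dv
  \;\le\; (2\pi)^{m/2} \int \xi(y/a)\, f(y)\, dy
  \;\le\; (2\pi)^{m/2} M a^m .
\end{align*}
% Step 3: divide by a^m and let a -> 0; xi(av) increases to xi(0) = (2 pi)^{-m/2},
% so monotone convergence gives
\begin{align*}
\int \psi(u,v)\, dv \;\le\; (2\pi)^{m} M .
\end{align*}
```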






R.A. Lockhart 



6.2. Proof of Theorem 3 



We use the shorthand $x$ for the vector $(x_1, \ldots, x_m)$ and $dx$ for the $m$-fold product of the
carrier measure in $x_1, \ldots, x_m$. Let $f_m$ be the joint density of $X_1, \ldots, X_m$; we suppress
the dependence of this density on $\theta$. For $n > r$, we let $q_n$ denote the density of
$(T_n - n\mu)/\sqrt{n}$, again suppressing the dependence on $\theta$. (Densities of sufficient statistics
are relative to Lebesgue measure while those of the data are relative to products of the
carrier measure.) We adopt the useful notation

$$Q_m = B_m' V^{-1} B_m, \qquad Q_{mn} = Q_m/n \qquad\text{and}\qquad q^*(x) = q_n(x)/\phi(0,V).$$

It is elementary that

$$E\{J \mid T_n = n\mu\} = \left(\frac{n}{n-m}\right)^{k/2} \int J(x)\, f_m(x)\, \frac{q_{n-m}(A_m)}{q_n(0)}\, dx,$$

where

$$A_m(x) = -\frac{T_m(x) - m\mu}{\sqrt{n-m}}.$$



The quantity in (12) may be written as $\sum_{i=1}^{8} I_i$ where $I_i = \int J(x) f_m(x) \tau_i(x)\, dx$ for
suitable functions $\tau_1, \ldots, \tau_8$. We will argue below that each integral is $O(n^{-2})$ uniformly
in $\theta$ over compact subsets $\Gamma$ of $\operatorname{int}(\Theta)$. The functions $\tau_i$ are given by


n(u) 

T 2 (U) 
73 (U) 
T 4 (u) 



fc / 2 g„_ m (A m ) - <t>(A m , V){1 + Ej=i ^(An)/(n - ™) j/2 } 



9n(0) 



= 1 



= 1 



r 5 (uj 
r e (u) 

T 7 (U) 



= 1 



mfc 
~2rT 

mfc 
~2n 

mk 
~2n 

mk 
~2n 

mk 
~2n 

mk 
~2n 



k/2 



( mk\ \ ^(4, V){1 + Ej = i V»j(^m)/(Tt- mp/ 2 } 



1 



£(0) 
1 



n 



1 - 



^2,0 

^2,0 
/) 

n 

^2,0 

n 



1 - 



^2,0 



>/2 



(-^m) 

^— ' (n — mW 2 

j=l v ' 



e -Q m „/2 



e -Q m „/2 



E 
E 



(n — my' 2 

n e ' 2 (n — m)i/ 2 J ' 
1 



»„/2 



■02,0 



1 



\/n(n — m) J ' 
1 ' 



n — m n 



,/2 _ 1 + 



2n 



V ; 2,o - ipi,i(B m ) 
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r 8 (u) = 1 



where 



^2,0 



T 9 (U) = 1 + 



Qm 

~2n 



"02,0 - V'l.lC-Sm) 



r 9 (u), 



mfc/2-Q m /2-Vi,i(B m ) 



Theorem 3 will follow if we show for $i = 1, \ldots, 8$ that

$$\sup_{|J| \le \bar J}\; \sup_{\theta \in \Gamma} |I_i| = O(n^{-2}).$$

These 8 assertions may be established using several bounds. We do not give complete 
details since the arguments are routine but we illustrate some of the details. For instance, 
it is elementary that 



f^_\ k/ \ {m + l)k/2 and (_!L-) 

\n — m J \n — m J 

Continuity and compactness imply 



fe/2 



sup sup 

eer x 



^0) 

nil' 1 



and 



Lemma 1 guarantees that 



inf 6(0,V) >0. 
eer 



lim inf inf q n (0) > 



< oo 



and so with $\epsilon_n$ as in Lemma 4 we have

$$|I_1| \le (m+1)^{k/2}\, \epsilon_n\, \sup_{\theta \in \Gamma} E_\theta(J) \Big/ \inf_{\theta \in \Gamma} q_n(0).$$


For I2, I5 and 1$ use the elementary facts that 



$$\frac{1}{n-m} - \frac{1}{n} = O(n^{-2}) \qquad\text{and}\qquad \frac{1}{\sqrt{n(n-m)}} - \frac{1}{n} = O(n^{-2}).$$
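
Both elementary facts are easy to confirm numerically: multiplied by $n^2$, the first difference tends to $m$ and the second to $m/2$. The helper names below are hypothetical and the snippet is only a sanity check of the orders of magnitude, not part of the proof.

```python
import math


def gap_reciprocal(n, m):
    """1/(n - m) - 1/n, which equals m / (n (n - m))."""
    return 1.0 / (n - m) - 1.0 / n


def gap_root(n, m):
    """1/sqrt(n (n - m)) - 1/n, which behaves like m / (2 n^2) for large n."""
    return 1.0 / math.sqrt(n * (n - m)) - 1.0 / n


def scaled_gaps(n, m):
    """Return n^2 times each difference; both stay bounded as n grows."""
    return n * n * gap_reciprocal(n, m), n * n * gap_root(n, m)
```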



Integral $I_3$ is bounded using Lemma 1 again. Integral $I_4$ uses the powers of $n$ in the
displayed sum. For $I_7$ use the inequalities $0 \le e^{-x} - 1 + x \le x^2/2$ (valid for $x \ge 0$) to see that
These bounds apply to the integrands; they are used to bound the integrals based on
the following observation. The condition that $J$ have finite expectation for all $\theta$ in $\operatorname{int}(\Theta)$
means that $J(x) f_m(x)/E_\theta(J)$ defines another exponential family with natural parameter
space including $\operatorname{int}(\Theta)$. This permits differentiation under the integral sign with respect
to $\theta$ as many times as desired. It is then easily established that for all $\alpha > 0$

$$\sup_{\theta \in \Gamma} E_\theta(\|T_r\|^{\alpha} J) < \infty.$$

This permits all the bounds derived above to be integrated against $J(x) f_m(x)$ to establish
the desired conclusion.

Differentiation under the integral sign permits us to show for any $J$ with $|J| \le \bar J$ the
following two identities:

VE e (J) = Cov e (J,T TO ), 

V 2 E e (J) = Cov e (J,B m B™) 

= E e (JB m B' m )-E 9 (J)K 

From these two identities, we deduce 

E e (JB^F- x B m ) = tracejE^JB^J^- 1 } 

= tracc-^E^J)^ 1 } + E e (J) trace(V _ 1 V). 

This and the observation that $\psi_{1,1}$ is a linear function establish the equivalence of the
two forms of $R(J,\theta)$ in (13) and (14).

6.3. Proof of assertions (9) and (11) 

We must prove 

E[l% j \T n = nn n ]-> / / ip(s)ip(t) 9j (s) gj (t)p (s,t)dsdt 
Jo Jo 

and 

$$E[S_n \mid T_n = n\mu_n] \to \int_0^1 \psi^2(s)\, \rho_\theta(s,s)\, ds.$$

To this end, define 

$$F(u \mid \mu) = E\{1(X_1 \le x) \mid T_n = n\mu\},$$

where $u$ is related to $x$ by $u = F(x,\theta)$. Then $F(u \mid T_n/n)$ is the Rao-Blackwell estimate
of $F(x,\theta)$. Also define $u_i = F(x_i,\theta)$ for $i = 1, 2$ and

$$F_2(u_1,u_2 \mid \mu) = E\{1(X_1 \le x_1, X_2 \le x_2) \mid T_n = n\mu\}.$$

Then $F_2(u_1,u_2 \mid T_n/n)$ is the Rao-Blackwell estimate of $F(x_1,\theta)F(x_2,\theta)$ (the unconditional
joint cumulative distribution function of $X_1$ and $X_2$).
Define

$$\rho_n(u_1,u_2 \mid \mu) = E\{W_n(u_1)W_n(u_2) \mid T_n = n\mu\}.$$

We then have

$$E\left[\left\{\int_0^1 Y_n(t)\, g_j(t)\, dt\right\}^2 \,\Big|\, T_n = n\mu\right] = \int_0^1\!\!\int_0^1 \psi(s)\psi(t)\, g_j(s) g_j(t)\, \rho_n(s,t \mid \mu)\, ds\, dt.$$



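Direct calculation shows that

$$\begin{aligned}
\rho_n(u_1,u_2 \mid \mu) &= F(\min(u_1,u_2) \mid \mu) - F(u_1 \mid \mu)u_2 - F(u_2 \mid \mu)u_1 + u_1u_2 \\
&\quad + (n-1)\{F_2(u_1,u_2 \mid \mu) - F(u_1 \mid \mu)u_2 - F(u_2 \mid \mu)u_1 + u_1u_2\} \\
&= F(\min(u_1,u_2) \mid \mu) - F(u_1 \mid \mu)F(u_2 \mid \mu) \qquad (19) \\
&\quad + (n-1)\{F_2(u_1,u_2 \mid \mu) - F(u_1 \mid \mu)F(u_2 \mid \mu)\} \\
&\quad + n\{F(u_1 \mid \mu) - u_1\}\{F(u_2 \mid \mu) - u_2\}. \qquad (20)
\end{aligned}$$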
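
That the two expressions for the conditional covariance agree is pure algebra, and it can be checked mechanically. In the sketch below the arguments `f_min`, `f1`, `f2` and `f12` are hypothetical stand-ins for $F(\min(u_1,u_2) \mid \mu)$, $F(u_1 \mid \mu)$, $F(u_2 \mid \mu)$ and $F_2(u_1,u_2 \mid \mu)$; they are treated as free numbers, so the check says nothing about their statistical meaning.

```python
def rho_first_form(n, f_min, f1, f2, f12, u1, u2):
    """Covariance written with the raw centring terms u1 and u2."""
    centred = -f1 * u2 - f2 * u1 + u1 * u2
    return (f_min + centred) + (n - 1) * (f12 + centred)


def rho_second_form(n, f_min, f1, f2, f12, u1, u2):
    """Same quantity regrouped into the three terms used in the proof:
    term (19), the pairwise term, and term (20)."""
    term_19 = f_min - f1 * f2
    pair_term = (n - 1) * (f12 - f1 * f2)
    term_20 = n * (f1 - u1) * (f2 - u2)
    return term_19 + pair_term + term_20


def max_discrepancy():
    """Largest absolute difference between the two forms over a small grid."""
    grid = [0.1, 0.35, 0.8]
    worst = 0.0
    for n in (2, 17, 400):
        for f_min in grid:
            for f1 in grid:
                for f2 in grid:
                    for f12 in grid:
                        a = rho_first_form(n, f_min, f1, f2, f12, 0.3, 0.6)
                        b = rho_second_form(n, f_min, f1, f2, f12, 0.3, 0.6)
                        worst = max(worst, abs(a - b))
    return worst
```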

We will establish (9) by proving

$$\rho_n(u_1,u_2 \mid \mu) \to \rho_\theta(u_1,u_2) \qquad (21)$$

uniformly in $u_1$ and $u_2$. We apply Theorem 3. Take $\bar J = 1$, $J_1(X_1) = 1(X_1 \le x_1)$,
$J_2(X_1) = 1(X_1 \le x_2)$ and $J_3(X_1,X_2) = 1(X_1 \le x_1, X_2 \le x_2)$. (The odd looking indexes
in $J_2$ are deliberate. The algebra involved in simplifying the remainder terms is easier if
we take $m = 2$ for $J_3$ and $m = 1$ for $J_1$ and $J_2$.) We find from (15) applied to $J_1$ and $J_2$
that the term (20) converges to 0 uniformly in $u_1$ and $u_2$. Applying (15) to $J_1$ shows
that the term (19) converges, uniformly in $u_1$ and $u_2$, to

$$\min(u_1,u_2) - u_1u_2.$$

Finally from (12), we find that

$$(n-1)\{F_2(u_1,u_2 \mid \mu) - F(u_1 \mid \mu)F(u_2 \mid \mu)\}$$

differs from

$$n\{A(n,J_3,\theta) - A(n,J_1,\theta)A(n,J_2,\theta)\}$$

by a quantity converging to 0 uniformly in $u_1, u_2$. Adopt the temporary notation $R_i = R(J_i,\theta)$ and $A_i = A(n,J_i,\theta)$ for
$i = 1, 2, 3$. Then, since $E_\theta(J_3) = E_\theta(J_1)E_\theta(J_2)$,

$$n(A_3 - A_1A_2) = R_3 - R_1E_\theta(J_2) - R_2E_\theta(J_1) - R_1R_2/n. \qquad (22)$$

From (15), we see that $R_1R_2/n$ converges to 0 uniformly in $u_1$, $u_2$, $x_1$ and $x_2$.
Computing we get

$$\begin{aligned}
R_3 &= kE_\theta(J_1)E_\theta(J_2) - \tfrac12 E_\theta(J_3B_2'V^{-1}B_2) + E_\theta(J_3\psi_{1,1}(B_2)), \\
R_1E_\theta(J_2) &= \tfrac{k}{2}E_\theta(J_1)E_\theta(J_2) - \tfrac12 E_\theta(J_1B_1'V^{-1}B_1)E_\theta(J_2) + E_\theta(J_1\psi_{1,1}(B_1))E_\theta(J_2), \\
R_2E_\theta(J_1) &= \tfrac{k}{2}E_\theta(J_1)E_\theta(J_2) - \tfrac12 E_\theta(J_2B_1'V^{-1}B_1)E_\theta(J_1) + E_\theta(J_2\psi_{1,1}(B_1))E_\theta(J_1).
\end{aligned}$$

Since $B_2$ is a sum of two independent terms we expand the quadratic form in $R_3$ to see

$$\begin{aligned}
E_\theta(J_3B_2'V^{-1}B_2) &= E_\theta(J_1B_1'V^{-1}B_1)E_\theta(J_2) \\
&\quad + E_\theta(J_2B_1'V^{-1}B_1)E_\theta(J_1) + 2E_\theta(J_1B_1')V^{-1}E_\theta(B_1J_2).
\end{aligned}$$

We may also use the linearity of $\psi_{1,1}$ and the independence of $X_1$ and $X_2$ to see that

$$E_\theta(J_3\psi_{1,1}(B_2)) = E_\theta(J_1\psi_{1,1}(B_1))E_\theta(J_2) + E_\theta(J_2\psi_{1,1}(B_1))E_\theta(J_1).$$

Thus, $R_3 - R_1E_\theta(J_2) - R_2E_\theta(J_1)$ simplifies to $-E_\theta(J_1B_1')V^{-1}E_\theta(B_1J_2)$. Since $V$ is the
Fisher information matrix in this problem, we have established (9). To check (11), we
make a very similar calculation.



6.4. Verification of Condition D for the Gamma family 

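Here, we establish (16). Change variables via $u = \phi_2/\theta_2$ to show the integral in (16) is
proportional to a power of $\theta_2$; thus we take $\theta_2 = 1$ without loss. The integral becomes:

$$\sup_{\theta \in \Gamma} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \left|\frac{\Gamma(\theta_1 + i\phi_1)}{\Gamma(\theta_1)}\right|^r \frac{1}{(1+\phi_2^2)^{r\theta_1/2}} \exp\{r\phi_1 \tan^{-1}\phi_2\}\, d\phi_1\, d\phi_2.$$

The substitution $\phi_2 = \tan(u)$ reduces the integral to

$$\sup_{\theta \in \Gamma} \int_{-\infty}^{\infty}\!\int_{-\pi/2}^{\pi/2} \left|\frac{\Gamma(\theta_1 + i\phi_1)}{\Gamma(\theta_1)}\right|^r \cos^{r\theta_1 - 2}(u)\, \exp(r\phi_1 u)\, du\, d\phi_1.$$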
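
The effect of the tangent substitution on the inner integral can be checked numerically. In the sketch below `a` plays the role of $r\theta_1$ and `b` the role of $r\phi_1$; the function names, the truncation width and the step counts are assumptions made for this check, and the midpoint rule is used only because it needs nothing beyond the standard library.

```python
import math


def integral_phi(a, b, half_width=200.0, steps=400000):
    """Midpoint rule for the integral of (1 + phi^2)^(-a/2) * exp(b * atan(phi))
    over [-half_width, half_width]; the neglected tail is tiny when a is large."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        phi = -half_width + (i + 0.5) * h
        total += (1.0 + phi * phi) ** (-a / 2.0) * math.exp(b * math.atan(phi))
    return total * h


def integral_u(a, b, steps=20000):
    """Midpoint rule for the integral of cos(u)^(a - 2) * exp(b * u)
    over (-pi/2, pi/2), the form obtained after substituting phi = tan(u)."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        u = -math.pi / 2.0 + (i + 0.5) * h
        total += math.cos(u) ** (a - 2.0) * math.exp(b * u)
    return total * h
```

With `a = 6` and `b = 0` both integrals equal $3\pi/8$ in closed form, which gives an independent check on the quadrature itself.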



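We integrate separately over 4 ranges: $R_1 = \{-M < \phi_1 < M\}$, $R_2 = \{|\phi_1| > M, \phi_1u < 0\}$,
$R_3 = \{\phi_1 > M, u > 0\}$ and $R_4 = \{\phi_1 < -M, u < 0\}$. Since $|\Gamma(\theta_1 + i\phi_1)| = |\Gamma(\theta_1 - i\phi_1)|$ the
integrals over $R_3$ and $R_4$ are equal. Over $R_1$ we use the inequality

$$\left|\frac{\Gamma(\theta_1 + i\phi_1)}{\Gamma(\theta_1)}\right| \le 1$$

(because the quantity inside the modulus signs is the characteristic function of $\log(X_1)$)
to get the bound, for $\theta_1 > \epsilon$ with $r\epsilon > 2$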



Ri 



r(0i 



101 



,r6i-2 



(u)e r<l,lU dudfa < Mexp{Mra/2} 
< nM exp{Mra/2}. 



n/2 



-7l/2 



-,re-2 



(u) du 
Over $R_2$ the term $\exp(r\phi_1 u)$ is bounded by 1. Thus,

r(0i + i<£i) 



cos rei - 2 (u)e r01 "dud</)i <7t / 

J — ( 



/■OO 


r(0i + i0i) 


/ — 00 


r(0i) 



The integral is bounded by the supremum of the density of $\log(X_1)$ over the real line
and the compact parameter set $\Gamma$.

Finally, we consider the integral over $R_3$. From Section 6.1.45 of Abramowitz and
Stegun [1], we find there is a constant $C$ such that

$$\left|\frac{\Gamma(\theta_1 + i\phi_1)}{\Gamma(\theta_1)}\right| \le C e^{-\pi\phi_1/2}\, \phi_1^{\theta_1 - 1/2}.$$
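
The exponential decay rate in this bound can be seen exactly in the special case $\theta_1 = 1$, where the reflection formula gives $|\Gamma(1 + iy)|^2 = \pi y/\sinh(\pi y)$. The sketch below uses that closed form to show that $|\Gamma(1+iy)|\, e^{\pi y/2}\, y^{-1/2}/\sqrt{2\pi}$ tends to 1 from above; restricting to $\theta_1 = 1$ is an assumption made so that only the standard library is needed, and the function names are hypothetical.

```python
import math


def abs_gamma_one_plus_iy(y):
    """|Gamma(1 + i y)| for y > 0, from |Gamma(1 + i y)|^2 = pi y / sinh(pi y)."""
    return math.sqrt(math.pi * y / math.sinh(math.pi * y))


def normalized_ratio(y):
    """|Gamma(1 + i y)| * exp(pi y / 2) * y^(-1/2) / sqrt(2 pi).

    Algebraically this equals 1 / sqrt(1 - exp(-2 pi y)), so it decreases to 1."""
    return abs_gamma_one_plus_iy(y) * math.exp(math.pi * y / 2.0) / math.sqrt(
        2.0 * math.pi * y
    )
```

For $\theta_1 = 1$ and $\phi_1 \ge M$ this shows that the constant $C = \sqrt{2\pi}/\sqrt{1 - e^{-2\pi M}}$ works in the displayed bound.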



For $\theta_1 \le 1/2$, $\phi_1 > M > 1$ and $r\epsilon > 2$ we then get



It:: 



r(0i + i&) 



r(fi) 



7t/2 



(u)e r * lU dud<^i < C / / e-^^-^cos^-^ujd^idu 



J M 
Tt/2 CQS r 9l -2 (u) 

"/ 2 sin^- 2 ( M ) 



(b( 



< C 
= C 

< c 

< — / M re_3 du < 00. 



sin re -"(it) 



da 



/■(/ 



For $\theta_1 > 1/2$ we get



r(0i+i0i) 



r(0i) 

7t/2 pOO 



cos 1 2 (u) exp(r^iu) diid^i 



e -*iK«/2-«)^(^-V2) cos^-2(„) d0i du 



r(l+r(0i-l/2)) r 72 sin rei - 2 (w) 



R 3 

<c 
<c 

- C r l+r(ei-l/2) 

For $r > 5$ the right hand side is uniformly bounded over $\Gamma \cap \{\theta_1 > 1/2\}$.



r l+r(8i-l/2) 



,r(0i-l/2)+l 



du 